Collaborating Authors: Hangzhou


Distilling Multi-view Diffusion Models into 3D Generators

arXiv.org Artificial Intelligence

We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using Gaussian splatting. DD3G compresses and integrates extensive visual and spatial geometric knowledge from the MV-DM by simulating its ordinary differential equation (ODE) trajectory, ensuring the distilled generator generalizes better than those trained solely on 3D data. Unlike previous amortized optimization approaches, we align the representation spaces of the MV-DM and the 3D generator to transfer the teacher's probabilistic flow to the student, thus avoiding inconsistencies in optimization objectives caused by probabilistic sampling. The introduction of probabilistic flow and the coupling of the various attributes in 3D Gaussians pose challenges for the generation process. To tackle this, we propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow and converts a single image into 3D Gaussians within 0.06 seconds. Furthermore, to reduce knowledge loss and overcome sparse-view supervision, we design a joint optimization objective that ensures the quality of generated samples through explicit supervision and implicit verification. Leveraging existing 2D generation models, we compile 120k high-quality RGBA images for distillation. Experiments on synthetic and public datasets demonstrate the effectiveness of our method. Our project is available at: https://qinbaigao.github.io/DD3G. With the rapid development of 2D-AIGC [1] and 3D Gaussian Splatting [2] technologies, there is a significant opportunity for the automated generation of 3D assets from a single image. Hao Qin, Ming Kong, Mengxu Lu, and Qiang Zhu are with the School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China.
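
To make the two-phase PEPD design concrete, here is a minimal sketch, assuming a PyTorch-style implementation; the module sizes, the noise-conditioning scheme, and the per-Gaussian attribute layout (position, scale, rotation, opacity, color) are illustrative assumptions, not the authors' code.

```python
# Minimal PyTorch-style sketch of a single-image -> 3D Gaussians generator in the
# spirit of PEPD (Pattern Extraction + Progressive Decoding). All module sizes,
# names, and the noise-conditioning scheme are illustrative assumptions.
import torch
import torch.nn as nn

class PatternExtraction(nn.Module):
    """Encode the input image (and an injected noise code) into latent pattern tokens."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(           # stand-in for a real image encoder
            nn.Conv2d(3, 64, 4, stride=4), nn.GELU(),
            nn.Conv2d(64, dim, 4, stride=4),
        )
        self.noise_proj = nn.Linear(dim, dim)     # fuses a sampled "probabilistic flow" code

    def forward(self, image, noise):
        feat = self.backbone(image)                       # (B, dim, H/16, W/16)
        tokens = feat.flatten(2).transpose(1, 2)          # (B, N, dim)
        return tokens + self.noise_proj(noise).unsqueeze(1)

class ProgressiveDecoding(nn.Module):
    """Decode pattern tokens into per-Gaussian attributes."""
    def __init__(self, dim=256, gaussians_per_token=4):
        super().__init__()
        self.refine = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        # 3 (xyz) + 3 (scale) + 4 (rotation quat) + 1 (opacity) + 3 (RGB) = 14 per Gaussian
        self.k = gaussians_per_token
        self.head = nn.Linear(dim, self.k * 14)

    def forward(self, tokens):
        x = self.refine(tokens)
        g = self.head(x).reshape(x.shape[0], x.shape[1] * self.k, 14)
        xyz, scale, rot, opacity, rgb = g.split([3, 3, 4, 1, 3], dim=-1)
        return {
            "xyz": xyz,
            "scale": torch.exp(scale.clamp(max=8)),        # keep scales positive
            "rotation": nn.functional.normalize(rot, dim=-1),
            "opacity": torch.sigmoid(opacity),
            "rgb": torch.sigmoid(rgb),
        }

class PEPDGenerator(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.extract = PatternExtraction(dim)
        self.decode = ProgressiveDecoding(dim)

    def forward(self, image, noise):
        return self.decode(self.extract(image, noise))

if __name__ == "__main__":
    model = PEPDGenerator()
    img = torch.randn(1, 3, 256, 256)
    z = torch.randn(1, 256)                                # sampled flow/noise code
    gaussians = model(img, z)
    print({k: v.shape for k, v in gaussians.items()})
```

In the paper's setting the distillation loss would align such a generator's outputs with the teacher's ODE trajectory and add rendering supervision; the sketch only shows the feed-forward image-to-Gaussians path.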


Multi-agent Uncertainty-Aware Pessimistic Model-Based Reinforcement Learning for Connected Autonomous Vehicles

arXiv.org Artificial Intelligence

Abstract--Deep Reinforcement Learning (DRL) holds significant promise for achieving human-like Autonomous Vehicle (AV) capabilities, but suffers from low sample efficiency and challenges in reward design. Model-Based Reinforcement Learning (MBRL) offers improved sample efficiency and generalizability compared to Model-Free Reinforcement Learning (MFRL) in various multi-agent decision-making scenarios. Nevertheless, MBRL faces critical difficulties in estimating uncertainty during the model learning phase, thereby limiting its scalability and applicability in real-world scenarios. Additionally, most Connected Autonomous Vehicle (CAV) studies focus on single-agent decision-making, while existing multi-agent MBRL solutions lack computationally tractable algorithms with Probably Approximately Correct (PAC) guarantees, an essential factor for ensuring policy reliability with limited training data. To address these challenges, we propose MA-PMBRL, a novel Multi-Agent Pessimistic Model-Based Reinforcement Learning framework for CAVs, incorporating a max-min optimization approach to enhance robustness and decision-making. To mitigate the inherent subjectivity of uncertainty estimation in MBRL and avoid catastrophic failures in AVs, MA-PMBRL employs a pessimistic optimization framework combined with Projected Gradient Descent (PGD) for both model and policy learning. MA-PMBRL also employs general function approximations under partial dataset coverage to enhance learning efficiency and system-level performance. By bounding the suboptimality of the resulting policy under mild theoretical assumptions, we establish PAC guarantees for MA-PMBRL, demonstrating that the proposed framework represents a significant step toward scalable, efficient, and reliable multi-agent decision-making for CAVs. Multi-Agent Reinforcement Learning (MARL) has emerged as a promising approach for enabling CAVs to execute complex tasks autonomously. R. Wen and R. Li are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310058, China (e-mail: {wenruoqi, lirongpeng}@zju.edu.cn). X. Xu is with the Information and Communication Branch of State Grid Hebei Electric Power Co., Ltd, China (e-mail: hsuxing@zju.edu.cn). Z. Zhao is with Zhejiang Lab, Hangzhou 311121, China, and also with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310058, China (e-mail: zhaozf@zhejianglab.com). However, the costly requirement for sufficient data through extensive real-world interactions leaves MFRL stuck in unstable learning and high computational overhead, making it less competent in autonomous driving scenarios.
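
To illustrate the pessimistic max-min idea in the abstract, the following is a minimal sketch, assuming a linear dynamics model, a quadratic reward, and a norm-ball uncertainty set: the policy maximizes return against a worst-case model perturbation that is kept inside the uncertainty set by projected gradient descent (PGD). The dimensions, reward, and projection radius are illustrative assumptions, not the authors' algorithm.

```python
# Pessimistic max-min sketch: inner PGD step makes the model adversarial within an
# uncertainty ball; outer step improves the policy against that pessimistic model.
import torch

state_dim, action_dim, horizon = 4, 2, 10
A_hat = torch.eye(state_dim) * 0.95                      # nominal learned dynamics: s' = A s + B a
B_hat = torch.randn(state_dim, action_dim) * 0.1
policy = torch.zeros(action_dim, state_dim, requires_grad=True)   # linear policy a = K s
delta = torch.zeros_like(A_hat, requires_grad=True)      # adversarial model perturbation
radius = 0.05                                            # uncertainty-set radius for the PGD projection

def rollout_return(K, dA):
    """Return of policy K under the perturbed dynamics A_hat + dA."""
    s = torch.ones(state_dim)
    total = 0.0
    for _ in range(horizon):
        a = K @ s
        total = total - (s @ s + 0.1 * (a @ a))          # negative quadratic cost as reward
        s = (A_hat + dA) @ s + B_hat @ a
    return total

policy_opt = torch.optim.SGD([policy], lr=1e-3)
for step in range(200):
    # Inner step: the model minimizes the return (pessimism), then is projected back.
    grad_delta, = torch.autograd.grad(rollout_return(policy, delta), delta)
    with torch.no_grad():
        delta -= 0.01 * grad_delta
        delta *= torch.clamp(radius / (delta.norm() + 1e-8), max=1.0)
    # Outer step: the policy maximizes the pessimistic return.
    policy_opt.zero_grad()
    (-rollout_return(policy, delta.detach())).backward()
    policy_opt.step()

print("pessimistic return:", rollout_return(policy, delta).item())
```

The multi-agent and PAC-guarantee aspects of MA-PMBRL are not captured here; the sketch only shows the single-agent pessimistic optimization loop.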


A Layer-Wise Natural Gradient Optimizer for Training Deep Neural Networks (Ant Group, Hangzhou, China)

Neural Information Processing Systems

Second-order optimization algorithms, such as the Newton method and the natural gradient descent (NGD) method, exhibit excellent convergence properties for training deep neural networks, but their high computational cost limits their practical application. In this paper, we focus on the NGD method and propose a novel layer-wise natural gradient descent (LNGD) method to further reduce computational costs and accelerate the training process. Specifically, based on the block-diagonal approximation of the Fisher information matrix, we first propose a layer-wise sampling method to compute each block matrix without performing a complete backpropagation. Then, each block matrix is approximated as a Kronecker product of two smaller matrices, one of which is diagonal, while keeping the trace equal before and after the approximation. Through these two steps, we obtain a new approximation of the Fisher information matrix that effectively reduces the computational cost while preserving the main information of each block. Moreover, we propose a new adaptive layer-wise learning rate to further accelerate training. Based on these new approaches, we propose the LNGD optimizer. The global convergence of LNGD is established under some assumptions. Experiments on image classification and machine translation tasks show that our method is quite competitive with state-of-the-art methods.
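
To make the trace-matched Kronecker approximation tangible, here is a small sketch, assuming a single layer whose Fisher block is estimated from activations and output gradients; the approximation uses an input covariance factor and a diagonal gradient factor, then rescales so the traces agree. The way the factors and the "exact" block are formed here is an assumption for demonstration, not the paper's estimator.

```python
# Kronecker-factored, trace-matched approximation of one layer's Fisher block:
# F ≈ D ⊗ A with D diagonal, rescaled so trace(D ⊗ A) = trace(F).
import torch

torch.manual_seed(0)
n_samples, in_dim, out_dim = 512, 8, 6

a = torch.randn(n_samples, in_dim)          # layer inputs (activations)
g = torch.randn(n_samples, out_dim)         # per-sample output gradients

# "Exact" per-layer Fisher block: E[vec(g a^T) vec(g a^T)^T] over samples.
ga = torch.einsum("ni,nj->nij", g, a).reshape(n_samples, -1)
F_exact = ga.T @ ga / n_samples                      # (out*in, out*in)

# Kronecker factors: input covariance A and a *diagonal* gradient factor D.
A = a.T @ a / n_samples                              # (in, in)
D = torch.diag((g * g).mean(dim=0))                  # (out, out), diagonal

# Rescale so that trace(D ⊗ A) equals trace(F_exact).
F_approx = torch.kron(D, A)
F_approx *= torch.trace(F_exact) / torch.trace(F_approx)

print("trace exact :", torch.trace(F_exact).item())
print("trace approx:", torch.trace(F_approx).item())
print("relative error:", (torch.linalg.norm(F_exact - F_approx)
                          / torch.linalg.norm(F_exact)).item())
```

The diagonal factor is what keeps the per-layer inverse cheap: inverting D ⊗ A only requires inverting the small dense A and the diagonal D.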


Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter

arXiv.org Artificial Intelligence

We study the task of language-conditioned pick and place in clutter, where a robot should grasp a target object in open clutter and move it to a specified place. Some approaches learn end-to-end policies with features from vision foundation models, requiring large datasets. Others combine foundation models in a zero-shot setting, suffering from cascading errors. In this paper, we aim to develop an effective policy by integrating foundation priors from vision, language, and action. The alignment formulation enables our policy to train with less data and preserve zero-shot generalization capabilities. We show that a shared policy for both pick and place actions enhances the performance of each task, and introduce a policy adaptation scheme to accommodate the multi-modal nature of actions. Extensive experiments in simulation and the real world show that our policy achieves higher task success rates with fewer steps for both pick and place tasks in clutter, effectively generalizing to unseen objects and language instructions. Videos and code are available at the project page. The ability to pick and place objects is essential for robotic manipulation [1]-[6]. Consider a scenario where a robot is commanded with language instructions to grasp a target object in open clutter and move it to a specified place. The target object may be partially or fully occluded, posing challenges for object grounding and grasping. In such scenarios, multiple pick and place actions may be needed to clear obstacles for object rearrangement. A common way to construct a policy for such tasks is to predict 6-DoF actions directly from raw sensory information, as in classic end-to-end policies. Recently, these policies have achieved promising performance by incorporating features of pre-trained foundation models, e.g., vision-language models (VLMs) and large language models (LLMs) [7]-[12]. However, they require large amounts of demonstration data for policy learning, particularly for tasks involving cluttered environments. In addition, one has to deal with generalization issues to deploy these policies in real-world applications. Kechun Xu is with Zhejiang University, Hangzhou, China, and Alibaba Cloud, Hangzhou, China. Xunlong Xia and Bing Deng are with Alibaba Cloud, Hangzhou, China. Kaixuan Wang, Yifei Yang, Yunxuan Mao, Rong Xiong, and Yue Wang are with Zhejiang University, Hangzhou, China.
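
As one way to picture how vision, language, and action priors can be combined, the following is a minimal sketch, assuming pixel-wise score maps: an unconditioned action prior (e.g., grasp-affordance scores) is fused with a language-conditioned relevance map for the target, and the policy picks the highest-scoring pixel action. The maps, the multiplicative fusion, and the temperature are illustrative assumptions, not the paper's alignment formulation.

```python
# Fuse a task-agnostic action prior with a language-grounded relevance map and
# select the best pixel action; the 6-DoF grasp would be derived downstream.
import torch

H, W = 64, 64
action_prior = torch.rand(H, W)        # e.g., from a pre-trained grasp network (task-agnostic)
language_relevance = torch.rand(H, W)  # e.g., VLM similarity between pixels and "the red mug"

temperature = 0.5
fused = action_prior * language_relevance            # align "graspable" with "is the target"
probs = torch.softmax(fused.flatten() / temperature, dim=0)

best = torch.argmax(probs)
v, u = divmod(best.item(), W)
print(f"pick at pixel (u={u}, v={v}), prob={probs[best]:.4f}")
```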


DeepSeek rushes to launch new AI model as China goes all in

The Japan Times

DeepSeek is looking to press home its advantage. The Chinese startup triggered a $1 trillion-plus sell-off in global equities markets last month with a cut-price AI reasoning model that outperformed many Western competitors. Now, the Hangzhou-based firm is accelerating the launch of the successor to January's R1 model, according to three people familiar with the company. DeepSeek had planned to release R2 in early May but now wants it out as early as possible, two of them said, without providing specifics. The company says it hopes the new model will produce better code and be able to reason in languages beyond English.


South Korea removes DeepSeek from app stores pending privacy review

Al Jazeera

South Korea has suspended downloads of DeepSeek's artificial intelligence-powered chatbot pending a review of the Chinese start-up's privacy standards. South Korea's privacy watchdog said on Monday that DeepSeek's R1 chatbot was removed from the local versions of Apple's App Store and Google Play after the Hangzhou-based firm acknowledged that it had failed to comply with personal data protection rules. The Personal Information Protection Commission said in a statement that DeepSeek accepted its proposal to suspend downloads of the app. The chatbot is still available for those who have already downloaded the app. "To prevent further concerns from spreading, the commission recommended that DeepSeek temporarily suspend its service while making the necessary improvements," the commission said, adding that bringing the app in line with local regulations would "inevitably take a significant amount of time".


What to Know About DeepSeek, the Chinese AI Company Causing Stock Market Chaos

TIME - Tech

A new Chinese AI model, created by the Hangzhou-based startup DeepSeek, has stunned the American AI industry by outperforming some of OpenAI's leading models, displacing ChatGPT at the top of the iOS app store, and usurping Meta as the leading purveyor of so-called open source AI tools. All of which has raised a critical question: despite American sanctions on Beijing's ability to access advanced semiconductors, is China catching up with the U.S. in the global AI race? At a supposed cost of just $6 million to train, DeepSeek's new R1 model, released last week, was able to match the performance of OpenAI's o1 model, the outcome of tens of billions of dollars in investment by OpenAI and its patron Microsoft, on several math and reasoning metrics. The Chinese model is also cheaper for users. The upshot: the U.S. tech industry is suddenly faced with a potentially cheaper and more powerful challenger, unnerving investors, who sold off American tech stocks on Monday morning.


Why is the AI world freaking out over China's DeepSeek?

The Japan Times

DeepSeek, an AI startup just over a year old, has stirred awe and consternation in Silicon Valley with its breakthrough artificial intelligence model that offers comparable performance to the world's best chatbots at seemingly a fraction of the cost. Created in China's Hangzhou, DeepSeek carries far-reaching implications for the global tech industry and supply chain, offering a counterpoint to the widespread belief that the future of AI will require ever-increasing amounts of power and energy to develop.


Flying drone can roll on the ground to save energy over long distances

New Scientist

An autonomous drone with wheels can roll along the ground, only flying when needed to clear obstacles, which helps its battery last seven times longer, according to its developers. Rolling robots are efficient and can travel long distances, but cannot traverse big obstacles, while flying drones can get past large obstructions, but have limited range.


How Shady Chinese Encryption Chips Got Into the Navy, NATO, and NASA

WIRED

From TikTok to Huawei routers to DJI drones, rising tensions between China and the US have made Americans--and the US government--increasingly wary of Chinese-owned technologies. But thanks to the complexity of the hardware supply chain, encryption chips sold by the subsidiary of a company specifically flagged in warnings from the US Department of Commerce for its ties to the Chinese military have found their way into the storage hardware of military and intelligence networks across the West. In July of 2021, the Commerce Department's Bureau of Industry and Security added the Hangzhou, China-based encryption chip manufacturer Hualan Microelectronics, also known as Sage Microelectronics, to its so-called "Entity List," a vaguely named trade restrictions list that highlights companies "acting contrary to the foreign policy interests of the United States." Specifically, the bureau noted that Hualan had been added to the list for "acquiring and ... attempting to acquire US-origin items in support of military modernization for [China's] People's Liberation Army." Yet nearly two years later, Hualan--and in particular its subsidiary known as Initio, a company originally headquartered in Taiwan that it acquired in 2016--still supplies encryption microcontroller chips to Western manufacturers of encrypted hard drives, including several that list Western governments' aerospace, military, and intelligence agencies as customers on their websites: NASA, NATO, and the US and UK militaries.